This is an interpretive manual designed to accompany
the Test of Proficiency in English as a Second Language, a
comprehensive test assessing production and perception skills in
written and spoken English and intended for use in Grades 4-6 in
Bureau of Indian Affairs schools. The manual is divided into three
sections. Section one discusses English proficiency and the ways in
which information from test results is best incorporated into
decisions affecting individuals and groups. Section two contains the
information about TOPESL, TOPESL scores, and the norms population
necessary for interpretation of scores and differences between
scores. Section three contains detailed information about the
development of TOPESL, and about the development of statistical
information for TOPESL. Statistical data are presented in tables, and
appendices list participating schools. (Author/AH)
Introduction



The Test of Proficiency in English as a Second Language (TOPESL) is a 
comprehensive test assessing skills in the production and perception of 
written and spoken English intended for use in grades four, five and six. 
The test is designed for use in Bureau of Indian Affairs schools. It
consists of three separately administered sections: English Structure (ES),
Listening Comprehension (LC), and Oral Production (OP). Accompanying the
test itself are a separate administrative manual and this interpretive
manual. The administrative manual gives detailed information about the
administration and scoring of TOPESL. This interpretive manual is divided
into three sections. Section I discusses English proficiency and the ways
in which information from test results is best incorporated into decisions
affecting individuals and groups. Section II contains the information about
TOPESL, TOPESL scores, and the norms population, which is necessary for
interpretation of scores and differences between scores. Section III
contains detailed information about the development of TOPESL, and about
developing statistical information for TOPESL.

For TOPESL, as with any standard test, familiarity with basic testing
concepts is a necessary qualification for the interpretation of scores.*
Persons not familiar with the theoretical and practical limitations on the
accuracy of test scores tend to give too much significance to any obtained
score. As will be noted repeatedly below, there is a margin of error in
the use of any test results which cannot be ignored. For this reason, 
other pertinent information should be used in conjunction with test scores 
wherever possible. It should be noted, however, that although there is a 
non-negligible margin of error present in measurement with standardized 
tests, this margin is far smaller than it is for most other forms of assess- 
ing human performance. 



*Persons interested in obtaining additional information on these
"basic testing concepts" should refer to texts such as: Lee J. Cronbach,
Essentials of Psychological Testing (2nd edition), Harper & Row, 1960;
Robert L. Ebel, Measuring Educational Achievement, Prentice-Hall, Inc.,
1965; Henry E. Garrett, Statistics in Psychology and Education (5th
edition), Longmans, Green and Co., 1961; David P. Harris, Testing English
as a Second Language, McGraw-Hill Book Co., 1969. In addition, some
extremely useful information on testing can be found in the test packet
containing a series of brochures available free from the Educational
Testing Service, Princeton, New Jersey. (An exceptionally useful brochure in
the ETS packet is number 1, Locating Information on Educational Measurement:
Sources and References, which contains an annotated bibliography.)
English Proficiency

There are many kinds of skills that are frequently classified together
under the broad heading "English." These include writing, spelling,
punctuation, phonics, reading, and correct usage, among others. These are
typically the aspects of the use of language which must be taught to
children who have learned English at home as their first language. They are
skills dealing for the most part with written standard English and the
correspondences between the spoken and written forms of the language. These
writing conventions are not the aspects of English with which TOPESL is
primarily concerned. TOPESL is concerned with the knowledge of English
structure, the way words go together to form sentences in English. It is
thus not concerned with where commas go, or the distinction between "lie"
and "lay," or whether "English" should be capitalized. Rather, it is
concerned with knowing, for example, that the answer to a "what" question is
usually a noun phrase and not "yes" or "no."



Purpose 

The scores from these tests are intended to provide users with both 
placement and group diagnostic information. The information provided by 
the test should be considered an adjunct to the teachers' knowledge and not
a replacement for it. Where, in individual cases, interpretations of test
scores yield conclusions greatly at variance with teachers' judgments,
extenuating circumstances should be sought. Perhaps the child marked in
the wrong section of the answer sheet, or was ill or worried about personal
problems.

Though TOPESL scores come from students' performance in three basic 
areas requiring a broad knowledge of English, TOPESL does not provide an 
exhaustive sampling of all aspects of English proficiency. Though vocabu- 
lary is doubtless an element in language proficiency, there is no specific 
vocabulary section to TOPESL. Similarly, though pronunciation is a notice- 
able aspect of spoken language, no attempt is made to assess this in TOPESL. 
These areas, though not assessed by TOPESL sections, are none the less 
important in overall language proficiency. Further, pronunciation assess- 
ment routines were not included in TOPESL because: (1) judging pronunciation 
deviations from standard English dialects is frequently extremely difficult 
for people who have not had specific training in phonetics and (2)
"mispronounced" grammatically correct utterances seem to be less of a problem
in communication than the respective statuses of encoder and decoder,
situational context and the like.
Information from tests has no intrinsic value; it is useful only as
an aid in decision making. For test results to be best utilized, they
should be part of the total input to a decision rule. A decision rule is
the process of choosing the kinds of information to be used, determining
the relative value to be assigned to each kind of information, and finally
specifying what is to be done with the information. The advantage of
decision rules is that they explicitly state what weight or value is to be
assigned to each of the types of available information. Decision rules can
be formulated to operate with a minimum of information; e.g., taking every
person over eight years of age would be a rule which could operate with
minimal information. With increased amounts of information available, more
complex decision formulae are useful. For example, in a given school where
two levels of instruction in English structure are available to students
from three classrooms, a hypothetical rule might be as follows: Add
together the standard score from each of the sections of TOPESL and divide by
100. Add 4 points for an A, 2 points for a B, and 1 point for a C grade in
English the previous year. Subtract one half point for each year in school.
Give non-readers two points. Then exempt the top third from any English
structure courses, put the second third in the upper level, and the last
third in the lower level of English structure. This rule is not offered
as a suggestion for actual use, but only to illustrate the statement of
such a rule. Any actual rule must take into account local considerations,
such as number of special classes available, consistency in grading, etc.
Should a decision formula give what seem to be incorrect results, the kinds
of information put in or the relative weight of the categories of
information should be changed.
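
The hypothetical rule above can be written out as a short program. The following is only a sketch of that illustrative rule, not a recommended procedure; the class name, field names, and level labels are assumptions introduced for the example, and grades other than A, B, or C are assumed to add no points.

```python
from dataclasses import dataclass

# Points added for the previous year's English grade (from the rule above).
# Assumption: any other grade contributes nothing.
GRADE_POINTS = {"A": 4.0, "B": 2.0, "C": 1.0}


@dataclass
class Pupil:
    name: str
    section_scores: tuple      # standard scores on ES, LC, OP
    english_grade: str         # previous year's English grade
    years_in_school: int
    non_reader: bool = False


def composite(pupil):
    """Apply the hypothetical rule to one pupil and return the composite."""
    total = sum(pupil.section_scores) / 100.0
    total += GRADE_POINTS.get(pupil.english_grade, 0.0)
    total -= 0.5 * pupil.years_in_school
    if pupil.non_reader:
        total += 2.0
    return total


def place(pupils):
    """Rank pupils by composite and split them into thirds."""
    ranked = sorted(pupils, key=composite, reverse=True)
    n = len(ranked)
    placement = {}
    for rank, pupil in enumerate(ranked):
        if rank * 3 < n:
            placement[pupil.name] = "exempt"
        elif rank * 3 < 2 * n:
            placement[pupil.name] = "upper level"
        else:
            placement[pupil.name] = "lower level"
    return placement
```

Writing the rule this way makes each weight explicit, so that if the results look wrong the kinds of information or their relative weights can be changed in one place.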

The decisions using test results can be classified in several ways
with respect to: (1) who makes the decisions--teachers, education
specialists, principals or supervisors; (2) where the decisions are made--in
the classroom, in the school, or in the district; or (3) who is affected by
the decisions--individuals or groups. Decisions involving individuals are
usually made at the local level and involve placement, diagnosis of problems,
and determination of whether individuals are performing to potential. Poor
knowledge of English may be responsible for performance below potential or
under-achievement. Decisions affecting groups may be local or non-local and
involve deciding the number of courses needed, how much time should be
devoted in the curriculum, grouping strategies, the evaluation of programs and
so on.

Decisions affecting groups are in one sense simpler to make. Any test- 
ing instrument makes some misclassifications and, as discussed in greater 
detail below, these are quantified by the standard error of measurement. 
Since the misclassifications in general are normally distributed, with size- 
able groups they tend to cancel each other out. Because of this tendency, 
there need be less concern with error of measurement in decisions affecting 
groups than in decisions affecting individuals. Let us consider first
local decisions about groups, then non-local decisions about groups, and
finally decisions about individuals.
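
The way errors cancel in groups can be made concrete with a small calculation. This is a sketch under two assumptions that the manual does not state in this form: that measurement errors are independent from pupil to pupil, and that every pupil shares the same standard error of measurement. Under those assumptions the error attached to a group mean shrinks with the square root of the group size. The function name is hypothetical.

```python
from math import sqrt


def sem_of_group_mean(sem, n_pupils):
    """Standard error of a group's mean score, assuming independent
    errors with a common standard error of measurement."""
    if n_pupils < 1:
        raise ValueError("group must contain at least one pupil")
    return sem / sqrt(n_pupils)


# Illustration with the raw-score SEm of 4.6 quoted for the English
# structure test and a class of 25 pupils.
class_error = sem_of_group_mean(4.6, 25)
```

For a single pupil the function returns the SEm unchanged, which is why individual decisions need more care than group decisions.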

Locally made decisions involving groups can be further separated into
within-classroom and within-school. The within-school decisions (usually
made by language arts specialists and principals in consultation with
teachers) involve determining how many levels of English instruction are
necessary, how much time should be devoted to it, and what strategies should
be followed for grouping. Various grouping strategies might be to put
together in the same classes people of the same interest level, or the same
cognitive level, or the same level in knowledge of English. Other
information would be necessary for implementing interest level or cognitive
level grouping, but TOPESL scores along with other information about English
proficiency would serve to group individuals according to their ability level
in English.

Much the same kinds of considerations are involved in the within-classroom
decisions made by the classroom teacher. If the school curriculum
is not departmentalized, all the groups established will be taught by the 
same teacher, but the same criteria apply to making the decisions. 

Non-local decisions by supervisors and curriculum planners will determine
how much of the curriculum should be devoted to the study of English
structure and the extent to which teacher training should emphasize the
teaching of English. Included will be the extent to which English
specialists need to be assigned to larger schools and available to smaller ones.
Additional use of test information at the administrative level will include 
the evaluation of effectiveness of different English programs at various 
schools. This will entail establishing regular testing programs, and 
specific procedures to evaluate results obtained from them. 

Decisions involving individuals will almost always be made within the
classroom or the school. Here, because the unit of focus is the individual
rather than a group, misclassifications or discrepancies between obtained
and "correct" or "true" scores cannot cancel out. The standard error of
measurement (SEm) thus becomes of greater importance. By chance, a person
with any given true score will have an obtained score differing from his
true score by more than one SEm about one in three times, and by more than
two SEm about one in twenty times. On the test of English structure, where
the SEm is 4.6 in raw score units, in general one out of three persons with
a true score of 35 would get an obtained score higher than 39 or lower than
31. Because of this uncertainty in individual score assignment, which is
present in any test, care must be exercised in use of scores in individual
placement. To facilitate use of the SEm, Section II reports the data for
various grades in the usual percentiles and also in intervals or bands two
Because of the range of error in individual score assignment, all
available and pertinent information in addition to test scores should be
included in the placement-decision formulae, such as previous grades,
teacher evaluations, scores on standardized tests, etc. Where three or
more independent sources of information are available, errors in score
assignment to individuals will tend to cancel out.
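
The "one in three" and "one in twenty" figures quoted for the standard error of measurement follow from treating measurement error as normally distributed. The sketch below checks them under that assumption; the function names are hypothetical and are not part of TOPESL.

```python
from math import erfc, sqrt


def prob_beyond(k):
    """Probability that an obtained score lies more than k standard
    errors of measurement from the true score, assuming normal error."""
    return erfc(k / sqrt(2))


def one_sem_band(true_score, sem):
    """Interval one standard error of measurement either side of a score."""
    return (true_score - sem, true_score + sem)


p_one = prob_beyond(1)                # a little under one in three
p_two = prob_beyond(2)                # a little under one in twenty
es_band = one_sem_band(35, 4.6)       # true score 35, raw-score SEm 4.6
```

With a true score of 35 and an error of 4.6 the band runs from 30.4 to 39.6, which agrees with the rounded limits of 31 and 39 given for the English structure test.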
SECTION II



Section I of this interpretive manual provided a statement of test
purposes and considered the use of test results in decision making. Section II
provides a brief discussion of TOPESL and the information necessary to
interpret raw scores. This information includes: reliability information--
how consistent the test is in its score assignments to individuals; error
of measurement data--with what degree of confidence an individual's score
can be expected to fall in a given interval; validity information--what
evidence there is that the test will actually accomplish the purposes for
which it was designed; recommended uses of TOPESL scores; use of norms
tables--how the performance of given classes of pupils is distributed; and
a description of the population on which the norms were based. TOPESL
consists of three basic types of testing instruments: (1) a written test of
English structures; (2) a listening comprehension test; and (3) an oral
production test.

There are two parallel forms of the written test, each of which contains
sixty-two multiple choice items. One type consists of a question
stem which can be answered with one of the choices; e.g., "What does Tommy
read in class?" "(a) Yes, he does; (b) Likes books; (c) School books."
The second type consists of an incomplete stem which can be completed with
one of the choices; e.g., "The ____ in this room is awful." "(a) heat;
(b) hot; (c) hotly."

The listening comprehension test consists of aural stimuli, recorded 
on tape, and three types of multiple choice responses: (a) choosing the 
correct picture of three which has been described on the tape; (b) identi- 
fying factual information which was actually given in a recorded conver- 
sation; and (c) using information contained in a recorded conversation in 
order to infer the correct choice. 

The third part of the test battery consists of an oral production test. 
In this test the student is shown several sets of pictures, each set
containing four pictures. Each picture in each of the sets varies slightly from
the others along some criterial attribute. The student is then shown a test
picture which is identical to one of the four in the set. Two responses
are required of the student. First he must point to the picture in the set 
which matches the test picture. Then he must tell the examiner how that 
particular picture differs from the others in the set. 

In order to aid teachers in evaluating the children's oral responses,
and to standardize evaluation throughout all of the schools, a correction
matrix was designed (see Table A). On the far left hand side of the matrix
is a series of grammatical categories. Each category represents a structure
elicited by one of the sets of pictures. Seven require simple sentences and
seven require complex sentences in order to describe the picture correctly;
e.g., a simple response to one item is "The boys are washing their faces."
TABLE A
TEST OF PROFICIENCY IN ESL
SCORE SHEET
ORAL PRODUCTION TEST

SCHOOL:                    DATE:                    EXAMINER:

Item  Category                                                  Points (one column per pupil)

  1   Preposition                                               1   1   1   1   1
  2   Subject + Verb                                            3   3   3   3   3
  3   Subject - Object Differentiation: One boy - Another boy   2   2   2   2   2
  4   Plural Pronoun Agreement: They - Their                    4   4   4   4   4
  5   Pronoun Gender Agreement: She - Her                       3   3   3   3   3
  6   Fluency                                                   4   4   4   4   4
  7   Present Progressive Tense: Be + ing                       1   1   1   1   1
  8   Article Presence: A / The                                 1   1   1   1   1
      Complexity                                                4   4   4   4   4
  9   Plural Noun: Their Books                                  2   2   2   2   2
      Complexity                                                4   4   4   4   4
 10   Count / Mass Noun: A Letter / Mail                        2   2   2   2   2
      Complexity                                                4   4   4   4   4
 11   Fluency                                                   2   2   2   2   2
      Complexity                                                4   4   4   4   4
 12   Present Progressive Tense: Be + ing                       2   2   2   2   2
      Complexity                                                4   4   4   4   4
 13   Verb Tense: Fall                                          2   2   2   2   2
      Complexity                                                4   4   4   4   4
 14   Quantifier: Many / A Lot Of                               4   4   4   4   4
      Complexity                                                3   3   3   3   3

                                                  TOTAL SCORE:
A response using a complex sentence, as in the second half of the oral
production test, is "The girl is watching the children read their books."
Along the rows opposite each category is a number from one to four which
the teacher crosses out if the response is wrong or leaves alone if the
answer is right. For each subject tested, the teacher simply adds the
column of numbers beneath the child's name which have not been crossed out.

The different grammatical categories are assigned numbers ranging from
one to four because previous administrations and statistical analyses of
the pre-tests showed that certain categories are more predictive of success
or failure on the total test. The most predictive items are scored four
points and so on down to the least predictive items which are scored one
point only. All the teacher has to do is listen for one specific
grammatical aspect, e.g., in item four plural pronoun agreement, and then allow or
disallow the number of points for that category only. In other words, even
if part of the child's response is grammatically incorrect, he still receives
total credit if the part of the response being evaluated for that particular
item is correct. For example, for item number five, where the category
being evaluated is pronoun gender agreement, the response, "The girl are
pointing to her mouth," would receive full credit, even though there is an
error in number agreement.
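
The scoring procedure just described amounts to summing the weights that have not been crossed out. The sketch below illustrates this, using the weights as read from the Table A score sheet; the item numbering is assumed to run from 1 through 14, and the data structure and function name are hypothetical.

```python
# (item number, category, weight) rows as read from the Table A score sheet.
# Assumption: items run 1-14, with a separate Complexity row for items 8-14.
SCORE_SHEET = [
    (1, "Preposition", 1),
    (2, "Subject + Verb", 3),
    (3, "Subject - Object Differentiation", 2),
    (4, "Plural Pronoun Agreement", 4),
    (5, "Pronoun Gender Agreement", 3),
    (6, "Fluency", 4),
    (7, "Present Progressive Tense", 1),
    (8, "Article Presence", 1), (8, "Complexity", 4),
    (9, "Plural Noun", 2), (9, "Complexity", 4),
    (10, "Count / Mass Noun", 2), (10, "Complexity", 4),
    (11, "Fluency", 2), (11, "Complexity", 4),
    (12, "Present Progressive Tense", 2), (12, "Complexity", 4),
    (13, "Verb Tense", 2), (13, "Complexity", 4),
    (14, "Quantifier", 4), (14, "Complexity", 3),
]


def oral_production_score(crossed_out):
    """Add up every weight that the examiner did not cross out.

    crossed_out is a set of (item, category) pairs judged incorrect.
    """
    return sum(weight for item, category, weight in SCORE_SHEET
               if (item, category) not in crossed_out)


# "The girl are pointing to her mouth" on item 5: the gender agreement
# is correct, so nothing is crossed out for that item.
perfect = oral_production_score(set())
missed_item_4 = oral_production_score({(4, "Plural Pronoun Agreement")})
```

Because only the targeted category is judged, an error elsewhere in the child's sentence never changes the set of crossed-out entries.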

The objectives of the test battery are threefold. The first is to 
identify the Amerindian child who needs special training in English versus 
the child who does not and to determine the placement of the former in the 
proper level of intensity of training in English. The second purpose is to 
provide the classroom teacher with specific linguistic information for each 
child in each language group which could be used as a diagnostic guide for 
teaching methods or materials. Potentially a third objective is to provide 
a means of assessing the relative merit of various English programs. These 
objectives require that certain decisions be made which can be classified 
as placement, diagnostic and evaluative decisions.

Reliability and Error of Measurement 

For the test of English structure, two kinds of reliability information 
are reported, internal consistency estimates from item homogeneity (KR-20) 
and parallel forms correlation (Pearson product-moment). These are reported 
in Table 1. Since internal consistency of the test could be spuriously high 
due to the effects from speededness (see Table 15), the error of measurement 
for the English structure test has been computed on the basis of the parallel 
forms figure. Because the listening comprehension test is paced by the 
accompanying taped stimuli, the effects of speededness are negligible and 
internal consistency figures are appropriately used in estimating reliability 
and computing the error of measurement for it, as reported in Table 1. 

For the oral production test, the use of internal consistency estimates 
from item homogeneity to determine reliability must be justified, because 
TABLE 1
Reliabilities and SEm's

Written Parallel Forms              r        N        M       SD

Form A              A-B            .89      251     34.2    15.7
Form B              B-A            .90      251     35.7    13.3
Combined                           .895             34.0    15.5

Form A                            KR-20      N        M       SD

Total Sample                       .96      291     38.6    15.4
Choctaw                            .97       38     31.5    17.4
Eskimo                             .97       53     39.1    16.3
Hopi                               .95       49     51.6    10.7
Navajo                             .94      151     35.9    13.3

Form B                            KR-20      N        M       SD

Total Sample                       .95      281     38.6    14.2
Choctaw                            .95       39     31.6    14.3
Eskimo                             .94       51     38.6    13.2
Hopi                               .90       46     52.9     7.8
Navajo                             .94      145     35.9    13.0

LC                                KR-20      N        M       SD     SEm

Total Sample                       .85      571     20.8     5.7
Choctaw                            .89      104     19.8     6.6
Eskimo                             .87      104     21.1     5.9
Hopi                               .82       94     24.4     4.4
Navajo                             .81      295     19.7     5.2
All Tested 4,5,6                          5,112     19.6     5.5    2.15

OP*                               KR-20      N        M       SD     SEm

Total Sample                       .73      182
All Tested                                1,660     43.5    11.5    5.85

*Numbers too small for breakdown by group.
the score on the oral production test is not simply the number of items
correct, but rather is the sum of a set of weights given to the items. For
a discussion of the weighting and the justification see Section III. It
should be noted here that the SEm reported in Table 1 is not exactly equal
to s√(1 - r), but that it is sufficiently close to give an index for proper
interpretation of scores for decisions involving individuals.

For all three sections of the test the SEm used in computing the
percentile bands is based on all four combined language groups. Where, because
of differences in homogeneity in individual groups, the size of the variance
may be changed, the percentile bands may be somewhat more or less than two
SEm wide.
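
The usual textbook relation between a test's standard deviation s, its reliability r, and its standard error of measurement is s multiplied by the square root of (1 - r). The sketch below applies it to the listening comprehension figures in Table 1. Pairing the "All Tested" standard deviation (5.5) with the "Total Sample" KR-20 (.85) is an assumption made for illustration, and the function name is hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Textbook estimate: SD times the square root of (1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * sqrt(1.0 - reliability)


# Listening comprehension: SD 5.5 (all tested), KR-20 .85 (total sample).
lc_sem = standard_error_of_measurement(5.5, 0.85)
```

Under these inputs the estimate comes to roughly 2.13, close to but not identical with the 2.15 reported in Table 1, which may reflect rounding or slightly different inputs.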



Validity 

The three sections of TOPESL, English structure, listening comprehension
and oral production, are intended to include a sufficient cross-section
of the basic skills involved in understanding and speaking English to
provide content validity for TOPESL. Content validity is primarily obtained
by having persons quite knowledgeable in the subject matter area cooperate
in the selection of techniques and production of items (see Section III
for details on this aspect), but it is best demonstrated by showing how well
the appropriate skills actually are sampled.

In the test of English structure, sentences and question-reply exchanges
are presented in written form and students must make judgments of
grammaticality about them. The grammaticality judgments consist of picking
which of the three alternatives corresponds most closely with standard
English. Of 86 initial grammatical categories tried out, 28 remain after
the item validation and selection procedures on the present 62 item test. 
The items are written so that no outside information about the content of 
the item is necessary because only one response is grammatically possible. 
In general, even though particular vocabulary items may be unknown it will 
still be possible to select the correct answer from grammatical considera- 
tions alone. Because the English structure items are presented in written 
form only, there is some confounding with reading ability. 

While the emphasis in the English structure section is on the form of 
the language, the listening comprehension section emphasizes the content. 
The listening comprehension items require that a sentence or brief conversa- 
tion be understood and an answer given which requires (1) recognizing a 
pictorial representation of that content, (2) recalling part of that con- 
tent, or (3) making a simple inference on the basis of that content. The 
tasks of recognizing pictorial representation, recall and inference may be 
separate from comprehension of English, yet some response mode is neces- 
sary, and by coupling the comprehension task with three different response 
modes the effects of confounding from any particular task may be 
reduced. Though the listening comprehension items are presented aurally,
i.e., both the stem and choices, some minimal reading skill is required of
pupils to determine which choice is being read and to mark their response
on the answer sheet.

The oral production section requires the student to be able to express 
himself sufficiently in English to be able to distinguish one picture from 
three others to the satisfaction of the examiner and to use correctly the 
different grammatical categories being tested in specific items.

Construct validity for TOPESL is indicated in three ways: through correlation
with other measures which require language facility, through the inclusion
of a subcriterion scale on the test of English structure, and through
correlation of test results with amount of English contact as estimated by
school administrators.

Two earlier forms of the test of English structure, used in item vali- 
dation studies, PL I and PL II, correlated as shown in Table 2 with existing 
standardized tests, teacher grades, and teacher ratings. It should be
noted that within each information type, teacher evaluations or standard
test scores, PL I and PL II correlated more highly with purer language
measures than with arithmetic scores.

The subcriterion scale consists of 16 written test items which had a 
strong discrimination power with Amerindian children but did not with 
Anglos. Discrimination scores, expressed as "phi" [φ = √(χ²/n)], are
measures of the degree to which each item differentiates "top" performers in
terms of the total test scores. The higher the discrimination score for 
any given item the better the item separates high achievers from low 
achievers. See Section III for a full discussion of the development of the 
subcriterion scale and a listing of the subcriterion items. The
subcriterion scale provided a basis for determining a validity index in item
selection for the English structure test. The assumption involved in the
use of the subcriterion scale is that items which do discriminate among 
persons learning English as a second language, but not among native speakers, 
are true measures of knowledge of English. 
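
A phi discrimination index of the kind mentioned above can be computed from a two-by-two table of group membership against item outcome. The sketch below shows the standard calculation; how the manual formed its "top" group and its comparison group is not stated here, so the table layout and the function names are assumptions.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table.

    a: top group, item right      b: top group, item wrong
    c: comparison group, right    d: comparison group, wrong
    """
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    return (a * d - b * c) / denominator


def chi_square_from_phi(phi, n):
    """Chi-square implied by phi for n pupils, since phi squared = chi-square / n."""
    return n * phi * phi


# Hypothetical counts: 40 pupils in each group.
example_phi = phi_coefficient(30, 10, 10, 30)
example_chi_square = chi_square_from_phi(example_phi, 80)
```

An item answered equally often by both groups gives a phi of zero, which is the pattern the subcriterion items were expected to show among native speakers.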

School administrators in each of the areas sampled for the norming 
administration of TOPESL were asked to evaluate the extent to which English 
was used outside of the school on the three point scale: (1) no English 
contact outside of school, (2) some, including access to television, (3) 
frequent English contact. No correlations between English contact and test
scores were obtained for two language groups, Choctaw and Hopi, as there
was no variation on that variable. However, data are available for 49 of
the 54 schools tested. The significant positive correlation between English 
contact ratings and scores on TOPESL as reported in Table 3 would indicate 
that TOPESL does in fact measure English proficiency. Where English contact 
is higher, a generally higher level of English proficiency can be expected, 
and the positive correlations indicate that TOPESL was successful in indicat- 
ing this difference. 
TABLE 3

Correlations of amount of English contact outside of school with TOPESL
section scores. No values for Choctaw and Hopi as there was no variation
in English contact for those groups.

                                   ESKIMO                NAVAJO

                                3       4,5,6         5        LIA

English Structure             .55**     .25**       .26**     .14**
Listening Comprehension       .44**     .15**       .15**     .12**
Oral Production               .48**     .40**       .03ns    -.03ns
Number of Schools              23        25          26        26
Number of Pupils              256       600        1,382     4,122
English Contact   M          2.25      2.17        2.14      2.25
                  SD          .65       .67         .83       .85

** - significant at .01     ns - not significant


Recommended Uses

For decisions involving individuals recommended uses of TOPESL scores 
are: (1) aiding in determining if a student is performing to potential; 
(2) individual placement; and (3) diagnosis of relative strength in various 
aspects of English. 

Determination of performance to potential is aided by determining if 
an individual scores below levels which indicate little or no knowledge of 
English. On the English structure and listening comprehension sections, 
which are both multiple choice tests, an individual may get a raw score 
somewhat above zero by chance alone, even if he doesn't know the answer to 
any question. Because of the variation of chance scores, a score must be
several points above the average chance score before it is truly indicative
of any knowledge of English at all. These chance scores are ES 432,
LC 399, OP 382 in standard scores. See the following paragraph for a
discussion of the origin of these figures. Where children score at or below
these chance levels it may be assumed that they are significantly limited
by their lack of knowledge of English and that this alone could account for
failure to perform well in other subject areas. Conversely, a score at the
level of performance of native English speakers would indicate that English
was not a bar to performance in other areas. It must be noted that a low
score on the English structure section alone would be more properly
considered as indicative of inability to read than lack of proficiency in English.
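
The idea of a chance score can be illustrated with the written test, which has sixty-two items with three choices each. The sketch below assumes pure guessing, modeled as a binomial distribution, and finds the raw score that covers 90 percent of that distribution. It works in raw-score units only; the conversion to the standard scores quoted above is not reproduced here, and the function names are hypothetical.

```python
from math import comb


def chance_cdf(k, n_items, n_choices):
    """Probability of k or fewer correct answers by guessing alone."""
    p = 1.0 / n_choices
    return sum(comb(n_items, i) * p**i * (1 - p)**(n_items - i)
               for i in range(k + 1))


def chance_cutoff(n_items, n_choices, coverage=0.90):
    """Smallest raw score whose cumulative chance probability
    reaches the requested coverage."""
    for k in range(n_items + 1):
        if chance_cdf(k, n_items, n_choices) >= coverage:
            return k
    return n_items


average_chance_score = 62 / 3          # about 20.7 items right by guessing
es_raw_cutoff = chance_cutoff(62, 3)   # raw score covering 90% of guessers
```

The gap between the average chance score and the cutoff shows why a score must sit several points above the chance average before it indicates any knowledge of English.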

Individual placement should be decided by a decision procedure which 
incorporates local information and local situational factors as well as 
TOPESL scores. Pertinent information from local sources would usually
include grades, ratings by teachers, and scores on other standard tests.
Local situational factors generally would determine the number of levels
of placement actually available and would include such considerations as
class sizes, time available for English structure instruction, number of
teachers conversant with English structure techniques, and quality and quantity
of English structure materials available. Where local decision procedures
have not yet been established, and for non-local determination of the
general level of English proficiency in groups of schools, the following
guidelines are suggested.



RECOMMENDATIONS                   ES         LC         OP       AVG ST

Intensive instruction
in English structure            0-432      0-399      0-382      0-407

Moderate instruction
in English structure           433-538    400-525    383-548    408-527

No special English             539 and    526 and    549 and    528 and
structure instruction           above      above      above      above


The first cutting score represents a point high enough to include 90
percent of the chance distribution for the ES and LC sections. The second
cutting score represents a point 3.25 SEm above the first cutting score for
all three sections, so that the middle section is 3.25 SEm wide. For the ES
section the second cutting score is also the point corresponding to the
tenth percentile of native speaker performance on an earlier form of the
test of English structure. For the OP section, since there are no chance
scores to provide a bottom cutting score and since data on native speaker
performance have not yet been gathered to provide a top cutting score, a
band 3.25 SEm wide was established which covers approximately the same range
as do the ES and LC cutting scores. Generally more than one section score
will be available. A convenient way of combining them is to take the average
standard (AVG ST) score, the cutting points for which are also indicated.
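
The guideline lookup can be sketched in a few lines of Python. This is
only an illustration: the cutting scores are copied from the table above,
but the function names, the section scores used, and the handling of
fractional averages (an average between two tabled ranges is assigned to
the higher range) are assumptions, not part of TOPESL.

```python
from statistics import mean

# Upper bounds of the "intensive" and "moderate" ranges, copied from the
# recommendations table. Anything above the second bound needs no special
# English structure instruction.
CUTS = {
    "ES": (432, 538),
    "LC": (399, 525),
    "OP": (382, 548),
    "AVG": (407, 527),
}


def recommend(score, scale):
    """Return the suggested level of instruction for one standard score."""
    intensive_max, moderate_max = CUTS[scale]
    if score <= intensive_max:
        return "intensive"
    if score <= moderate_max:
        return "moderate"
    return "none"


def average_standard_score(section_scores):
    """Average the available section standard scores (AVG ST)."""
    return mean(section_scores.values())


# Illustrative section scores (hypothetical pupil).
pupil = {"ES": 460, "LC": 507, "OP": 496}
avg = average_standard_score(pupil)
print(round(avg, 1), recommend(avg, "AVG"))
```

Local decision procedures, where they exist, take precedence over a
mechanical lookup of this kind.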

Diagnostic information on individuals is obtained by comparing section 
percentile scores, using the appropriate reference group. For example, of 
three Navajo fifth graders with standard scores and percentile scores of: 



                 ES                    LC                    OP
            STD  %-ile Band       STD  %-ile Band       STD  %-ile Band

Pupil 1:    460    28-50          507    32-60          496    26-67
Pupil 2:    566    64-83          435    18-38          513    30-73
Pupil 3:    524    50-70          562    52-82          409    11-30


The first shows quite balanced performance across the three English skills,
while the second shows possible deficiency in understanding spoken English
and the third possible deficiency in producing English. It should be kept
in mind, in comparing section scores, that the stability of a difference
between two scores is less than that of one score taken alone. Therefore,
comparison of section scores should be done using the percentile bands to
take into account the SE. A student's performance on two sections of the
test should not be considered different unless the percentile bands do not
overlap.
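
The overlap rule can be sketched as follows, using the percentile bands
of the three pupils tabled above. The function names are hypothetical,
and treating bands that merely touch at an endpoint as overlapping is an
assumption.

```python
from itertools import combinations


def bands_overlap(a, b):
    """True if two (low, high) percentile bands share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


def differing_sections(bands):
    """Return the section pairs whose percentile bands do not overlap."""
    return [
        (s1, s2)
        for (s1, b1), (s2, b2) in combinations(bands.items(), 2)
        if not bands_overlap(b1, b2)
    ]


pupils = {
    "Pupil 1": {"ES": (28, 50), "LC": (32, 60), "OP": (26, 67)},
    "Pupil 2": {"ES": (64, 83), "LC": (18, 38), "OP": (30, 73)},
    "Pupil 3": {"ES": (50, 70), "LC": (52, 82), "OP": (11, 30)},
}

for name, bands in pupils.items():
    print(name, differing_sections(bands))
```

An empty list corresponds to balanced performance; a listed pair marks
two sections on which performance may be considered different.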

For decisions involving groups, recommended uses of TOPESL scores are:
(1) diagnosis of group difficulties; (2) planning curricula; and
(3) evaluation of programs.

The source of information for group diagnosis is item data on the ES
section. While the number of items per category is not high enough to give
reliable diagnostic information about individuals, useful information about
group performance on particular categories can be obtained from item
statistics. Item statistics can be used in two ways. The first is to use the
information provided in Tables 7 through 11 directly. Table 7 gives the
categorization of the items on the ES section. Tables 8 and 9 are based on
combined group and give rank order of difficulty (ROD), the item difficulty,
p (percent of the sample selected for item analysis marking the correct
answer to the item), and an estimate of the percent of the population
actually knowing the right answer, p', based on distribution of responses to
the item. Tables 10 and 11 give p and p' for individual language groups.
These figures provide information about the relative difficulties for
various groups of the categories assessed by the ES section. For example,
item number 45 on Form A of the ES section, which tests [illegible], has a
p' of .38 for total group, .08 for Choctaw, .30 for Eskimo, .91 for Hopi and
.14 for Navajo, indicating that it is difficult for all groups for which
data are provided except the children in Hopi schools. Item number 19 on
Form B, which examines comparative modifiers, is about equally difficult for
all groups, with p' scores of .77 for total group, .71 for Choctaw, .84 for
Eskimo, .74 for Hopi and .73 for Navajo.

A second use of item statistics is to compare the performance of
classes on test items with the performance of the appropriate norms group.
For example, if a class at a Navajo school has a difficulty score of .27 for
item number ten, which for children at Navajo schools in general has a
difficulty score of .89, it may be concluded that that particular class has
more trouble with questions containing verbs with separable particles (e.g.,
"look up the word" or "look the word up") than the population from which
they come. Therefore, time should be spent teaching such constructions.
Calculations for this kind of use of item information are extremely simple,
and only involve determining the percent of persons selecting the right
answer to each question. This is then compared with the p score reported for
that item for the appropriate reference group. Since these figures are for
grades four, five and six combined, classes which are all fourth or fifth
or sixth may be expected to differ systematically from these, in that fourth
graders score lower and sixth graders higher. Because there is some chance
fluctuation in item statistics, differences between class difficulty scores
and the reported difficulty scores should be greater than .20 to be
considered meaningful.
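
The class-level calculation described above might be sketched as below.
The function names and the sample class are hypothetical; the .20
threshold and the .27 versus .89 comparison come from the text.

```python
def item_difficulty(correct_flags):
    """Proportion of pupils in the class answering the item correctly (p)."""
    return sum(correct_flags) / len(correct_flags)


def differs_meaningfully(class_p, reference_p, threshold=0.20):
    """Apply the rule that differences must exceed .20 to be meaningful."""
    return abs(class_p - reference_p) > threshold


# Hypothetical class of 100 pupils, 27 of whom answered item ten correctly.
class_item_ten = [True] * 27 + [False] * 73
class_p = item_difficulty(class_item_ten)

# .89 is the difficulty score quoted in the text for Navajo schools.
print(class_p, differs_meaningfully(class_p, 0.89))
```

Because the reference figures combine grades four, five and six, a
single-grade class near the threshold deserves a second look before any
conclusion is drawn.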



Use of Norms Tables

Norms tables are provided for individual language groups, Choctaw,
Eskimo, Hopi, Navajo and, in addition, for combined groups, for grades
three, four, five and six. The tables are arranged so that conversion from
raw to standard scores can be accomplished at the same time that percentile
scores and percentile bands are obtained. Numbers on which tables are
based, means and standard deviations are given at the bottom of each.

The score conversion is based on combined scores for all fourth, fifth
and sixth graders tested. The mean of the standard scores is 500 and the
standard deviation is set at 100. There were two reasons for choosing this
score conversion scale. First, it has been in use for some time, and many
people are familiar with it. Second, because the standard scores vary
around 500, they are not easily confused with IQ scores. Through the use
of both standard scores and percentile scores by group, two simultaneous
comparisons are allowed. The percentile scores for each group show relative
standing for that grade and group, and the standard scores show whether the
individual is above or below average for all children tested in grades
four, five and six irrespective of grade. For example, a Hopi sixth grader
with a raw score of 49 on the ES test would be almost one SD above average
(standard score 594 is mean of 500 + .94 SD) for all children tested, and
yet in the 35th percentile of his reference group. This means that 65
percent of his reference group scored higher than he did. Similarly, an
Eskimo fourth grader with a raw score of 19 on the LC test would be barely
average for all children tested (standard score of 489 is mean of 500 less
.11 SD) and yet in the 68th percentile for his reference group, which means
that only 32 percent of his reference group scored higher on the LC test.
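
The two readings of a score in the examples above reduce to simple
arithmetic, sketched here. The function names are hypothetical; the scale
constants (mean 500, SD 100) and the two example scores are from the
text.

```python
def sd_units(standard_score, scale_mean=500, scale_sd=100):
    """Distance of a standard score from the all-pupil mean, in SD units."""
    return (standard_score - scale_mean) / scale_sd


def percent_scoring_higher(percentile):
    """Share of the reference group scoring above a given percentile."""
    return 100 - percentile


# Hopi sixth grader, ES: standard score 594, 35th percentile in his group.
print(sd_units(594), percent_scoring_higher(35))

# Eskimo fourth grader, LC: standard score 489, 68th percentile.
print(sd_units(489), percent_scoring_higher(68))
```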

The combined group tables are the recommended reference for all groups
not sampled and for groups sampled, but for whom no percentile data are
given because the number of persons on whom data were available was too low
to compute percentiles. Specifically, there are no percentile data given
for Choctaw and Hopi on the OP section.

Percentiles based on fewer than 200 persons are apt not to reflect
accurately the true distribution of ability within the reference group, and
so should be used with caution. This advisement applies to OP data for
Eskimo schools, and to all data for Hopi and Choctaw schools. Though the
percentile data for these groups must be used advisedly, they are provided
here as they do give an idea of general ability level and distribution
within the groups in Fall 1970, when data for the norms tables were
obtained.

To provide an additional basis of comparison, it is recommended that
norms based strictly on local groups be established. This is strongly
advised for all groups for which no norm data are provided here and for
groups for which the data given here are based on considerably fewer than
200 persons. The development of norms for local groups is a straightforward
procedure which involves calculating midpoint percentiles from frequency
distributions of scores of groups of 200 or more.
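
A minimal sketch of the local-norms calculation follows. It assumes the
usual definition of a midpoint percentile (the percent of the group
scoring below a given score plus half the percent obtaining exactly that
score); the manual does not spell the formula out, and the function name
is hypothetical.

```python
from collections import Counter


def midpoint_percentiles(scores):
    """Map each observed score to its midpoint percentile in the group."""
    counts = Counter(scores)
    n = len(scores)
    below = 0  # number of pupils scoring below the current score
    table = {}
    for score in sorted(counts):
        freq = counts[score]
        table[score] = 100.0 * (below + 0.5 * freq) / n
        below += freq
    return table


# Tiny illustrative group; real local norms call for 200 or more pupils.
print(midpoint_percentiles([10, 12, 12, 15]))
```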

As an example of the use of the norms tables, consider a fifth grade
Eskimo who had the following raw scores: ES 39, LC 18, OP 49. Looking in
the norms tables for Eskimo schools for fifth graders, it is found that
these raw scores correspond to standard scores of ES 524, LC 471, OP 548
and percentile bands of ES 31-52, LC 22-36, OP 34-83. Since these bands
all overlap somewhat, it cannot be concluded that this student's abilities
differ on the skills assessed by the various sections of TOPESL.



Description of the Norms Sample

The population for the norms consisted of all Amerindian children in 
grades three, four, five and six in school on the ten days following the 
twenty-fifth day of instruction in the Fall of 1970, in the schools selected. 
Schools were selected on the basis of a stratified sampling schema which 
took into consideration: whether schools were boarding or day, school size, 
school accessibility, language group of school population, and availability 
of teachers for workshops held in Summer 1970. These workshops were given 
to train teachers in the administration and scoring of the OP section of 
TOPESL. Because only schools with Choctaw, Eskimo, Hopi and Navajo speakers 
were selected and because consideration of availability of teachers was 
involved, the sample cannot be said to be strictly representative of the 
total population of BIA schools. However, the 6,977 pupils tested constitute
43 percent of the 16,040 enrolled in Bureau schools in grades three, four,
five and six in 1970, and of the 37 listings by tribe in Statistics
Concerning Indian Education, the four language groups tested account for 83
percent of the total BIA school population.

For purposes of description, schools participating in the norming 
administration were classified into the following categories: 

Total Enrollment:  5 sizes   - 0-74   75-149   150-299   300-599   600+
Accessibility:     3 degrees - remote   difficult   easy
English Contact:   3 degrees - none   some   frequent

The sampling of schools within the five size strata is indicated in Table 4. 
TABLE 4

Comparison of School Sizes in Sample
and in BIA School Population

              No. in Population          No. in Sample         Percent
Size        Boarding   Day   Total     Boarding   Day   Total  Sampled

0-74            8       75     83          0       12     12     .145
75-149          7       34     41          1        6      7     .17
150-299        17       21     38          4        7     11     .29
300-599        17        4     21          6        1      7     .33
600+           28        4     32         15        2     17     .53

Totals         77      138    215         26       28     54     .25


As can be seen from Table 4, large schools are over-represented in the
sample obtained, with a corresponding under-representation of small schools.
Because the correlation of school size with test scores varies for different
language groups, as reported in Table 5, that table should be examined to
determine the effect of this over-representation of large schools in
particular cases.



TABLE 5 

Correlation of School Size with TOPESL Section Scores 



                     ES       LC       OP      # of     # of    Mean    SD
         Grade                                Schools  Pupils   Size   Size

Choctaw  3          .55**    .06ns    .39**      2        68    4.2    .96
Choctaw  4,5,6      .29**    .13ns   -.34**      2       196    4.2    .98
Eskimo   3          .22**    .40**    .43**     23       236    2.3   1.2
Eskimo   4,5,6      .22**    .14**    .31**     23       600    2.2   1.1
Hopi                No Variation in School Size
Navajo   3          .13**    .08*     .07*      26     1,382    4.5    .63
         4          .07*     .04ns   -.01ns     26     1,502    4.5    .63
         5          .12**    .08*    -.06*      26     1,344    4.5    .63
         6          .05ns    .08*    -.19**     26     1,310    4.7    .45

ns not significant     * significant .05     ** significant .01



Table 6 provides a complete listing by number of the constitution of
the norms sample considering the characteristics: school size, school
accessibility, English usage, grade, sex, age and language group. The 
schools which participated in the norming administration are listed in 
Appendix II. 
TABLE 6
Numbers in Sample

                           School Size                     School Access.
Grade       N        1      2      3       4       5        1       2       3

3         1,802     71     96    210     592     833      523     856     423
4         1,894     74     99    189     524   1,008      500     936     458
5         1,648     52     99    141     474     882      397     796     455
6         1,633     65     57    121     428     962      393     726     514

          6,977    262    351    661   2,018   3,685    1,813   3,314   1,850


             English Usage              Sex                  Language
Grade       1       2       3        M       F      Choc.   Esk.   Hopi    Nav.

3          424     636     742      914     885       68    236    110   1,388
4          440     579     875      829   1,001       78    232     81   1,503
5          377     497     774      849     793       52    189     57   1,350
6          306     469     858      783     846       66    179     76   1,312

         1,547   2,181   3,249    3,375   3,525      264    836    324   5,553


                                  Age
Grade        8       9      10      11      12      13      14

3          352     792     404     136      31       7      10
4           24     272     832     458     166      42      13
5            3       7     208     749     455     142      47
6  [illegible]       1      11     216     751     465     179

           561   1,072   1,455   1,559   1,403     656     249
CLASSIFICATION OF WRITTEN ITEMS

                                 Form A                     Form B

V. Verb
   1. Auxiliary
      a) Agreement w/reply       A23, A26, A37, A57         B23, B24, B46
      b) Do in question          A5, A48                    B14
      c) Replacive               A45                        B16, B41, B47, B52
      d) Modal                   A27                        B11
      e) Passive                 A20, A42, A60              B30, B36, B62
   2. Complement
      a) Infinitive              A3                         B9
      b) Inf. without to         A32, A41                   B18, B28
      c) Gerund                  A36                        B37
   3. Order in question with
      separable particle         A7, A10, A11               B12, B20
   4. Tense
      a) Agreement w/adverb      A33, A49, A54, A56, A59    B34, B44, B55, B56, B60
      b) Past as conditional     A31, A46, A62              B17, B49, B22
      c) Sequence of             A2                         B50

M. Modifier
   1. Comparative                A21, A22, A30, A34, A38    B25, B38, B19, B27, B15, B51
   2. Adjective
      a) Selection               A8                         B4
      b) Numeral                 A52                        B43
   3. Adverbial
      a) Negative                A9                         B26
      b) Intensifier             A12                        B10
      c) Frequency               A15, A44                   B45
      d) Time                    A17                        B5
      e) Locative Phrase         A53                        B54

N. Nominal
   1. Relative pronoun           A4, A13, A18               B1, B6, B42
   2. Direct object pronoun      A14, A35                   B40
   3. Reflexive                  A24                        B2
   4. Possessive pronoun         A47, A50, A55, A58         B31, B48, B33, B61
   5. Noun selection             A29                        B3

C. Conjunction
   1. Coordinate                 A1, A6                     B8, B53
   2. Subordinate                A40, A43, A61, A16, A19,   B29, B39, B59, B53, B35
                                 A39

Q. Question reply                A25, A51                   B7, B57

W. Word order in relative
   clause                        A28                        B13
[Item statistics tables for the ES section. Column headings: Item #,
Category, ROD, Matched Item. The tabulated values are not recoverable.]
Norms Tables

The following 20 pages present the raw to standard score conversions,
midpoint percentiles and percentile bands for Combined Group, and for
Choctaw, Eskimo, Hopi and Navajo schools. The English structure, listening
comprehension and oral production percentiles are listed together by grade.
The English structure section is referred to in the tables as "Written."
Abbreviations used are: Stnd for Standard; Band for Percentile Band; and
Mid for Midpoint Percentile. An asterisk (*) indicates that the percentile
scores are either less than 1 percent or greater than 99 percent, as
appropriate to its position at the top or bottom of the table.

The number of pupils on whom the percentile figures are based, the
mean and standard deviation for each group are given at the bottom of the
tables.
SECTION III



Sections I and II of this interpretive manual consider the use of test 
results in decision making and provide information necessary to interpret 
scores. Section III provides a discussion of test technique selection and
item validation studies, and also gives information on the calculation of 
the statistical data. 



Test Technique Selection 

The basic goals of the test development project were formulated in 
the summer of 1968, when a series of planning meetings, attended by ELTP
staff, BIA staff and project consultants, were held. At these meetings it
was decided that primary importance should be given to developing a test 
of English proficiency for grades four, five and six. These grades were 
selected on the basis of data presented at the meetings about student moti- 
vation and attendance patterns in BIA schools, namely: interest in school 
begins dropping by fourth grade, there are motivation problems by sixth 
grade, and a 20 to 30 percent drop-out rate by eighth grade. 

The goal of the development effort was to be a test of English
proficiency which assessed skills in both production and perception of
English, and which would be "culture fair." On the basis of potential
content validity various techniques were suggested to be used in eliciting
production and perception responses exhibiting those skills considered
necessary for speaking and understanding English.

The following techniques were evaluated as to how well they could 
assess English production: description of pictures to a rater, reading a 
paragraph, telling a story suggested by a series of pictures, role playing, 
interviewing, description of an object presented for manipulation, imita- 
tion of given sentences, transformation of given sentences into their 
corresponding passives or negatives, word association following direction, 
and self-ratings. From these techniques, picture description, repetition
and transformation of given sentences were selected for field testing.

It was decided to examine the perception of English by using both
written and auditory (spoken) stimuli. Techniques considered for use in
written form included: question and short answer, sentence completion,
combining two simple sentences into a complex one, breaking a complex
sentence into two simple ones, selecting transformations of a given
sentence, and correctly ordering sentences presented in scrambled form. The
format finally chosen for field testing of written perception consisted of
multiple choice items of the short answer and sentence completion type. The
items for the written form were to be written to completely cover a list of
grammatical categories selected by linguists as being basic to communication
in English.
Techniques considered for use in spoken form for examining perception
of English were: distinguishing minimal phonetic pairs, identifying a 
picture described by a test sentence, inferring where a presented conver- 
sation took place, and recalling information presented in a conversation. 
All of these except minimal pair discrimination were selected for field 
testing. 

It was also decided that, along with field testing of techniques, 
external criteria adequacy and validity studies should be carried out. 
Field testing attempted to find out (1) what kinds of things people respond 
to and (2) what kinds of things yield responses that have a varying range. 
For external criteria adequacy, other measures that might show English 
ability, i.e., achievement test scores, IQ scores, attendance records, 
grades, reading rates, teacher ratings and self-image ratings, were
suggested. External criteria adequacy studies were necessary to answer the
question of how well these other measures reflect knowledge of English.
Measures finally selected were grades, teacher ratings and achievement test 
scores. Validity studies were then necessary to find what, if any, cor- 
relation existed between scores from the new techniques and ratings from 
the external criteria. 

Field testing of the selected new techniques was carried out both 
in BIA schools with native speakers of Amerindian languages, and in 
Los Angeles schools with native speakers of English. Not only the target 
grades, four, five and six, but also grades three and seven were tested, 
to provide bracketing information about proficiency. At the end of the
first year techniques had been validated through field testing, and through 
studies to determine correlations of test scores within each of the pro- 
spective techniques. A summary of the figures from the correlation 
studies has been given in Table 3. 

Once techniques had been validated, it remained to improve the con- 
sistency of the individual test sections by subjecting each technique to 
item analysis for item validation. Schools participating in item validation
studies in Fall 1969 and Spring 1970 are listed in Appendix I. The written
items for the English structure section were validated in two ways, against
total test score and against a subcriterion score. The subcriterion score
was the number correct on 16 items selected on the basis of relatively high
discrimination scores among Amerindian children speaking English as a first
language. Table 12 provides a listing of the categorization of the
subcriterion items, and the difference in discrimination power between the
two groups. The discrimination scores using total test scores and
subcriterion scores were combined with item difficulty data to produce
indices of item validity and item reliability, using the formulas:
$i_r$ = index of reliability = $r_{gt} s_g$; $i_v$ = index of validity =
$r_{gc} s_g$; where $r_{gt}$ is the correlation of item score with total
test score, $r_{gc}$ is the correlation of item score with subcriterion
score and $s_g$ is the item standard deviation, equal to $\sqrt{pq}$ with
$q = 1 - p$. The indices of validity and reliability were used in item
selection following a technique outlined by Gulliksen (1950, p. 383), in
which items are chosen on the basis of high index of validity and the ratio
of $i_r/i_v$.
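
The two item indices can be computed as sketched below: each is the
product of an item correlation (with total score or with subcriterion
score) and the item standard deviation, the square root of p(1 - p). The
function names and the example values of p and the correlations are
hypothetical and are not taken from the TOPESL item data.

```python
import math


def item_sd(p):
    """Standard deviation of a right/wrong item with difficulty p."""
    return math.sqrt(p * (1.0 - p))


def index_of_reliability(r_item_total, p):
    """Item-total correlation times item standard deviation."""
    return r_item_total * item_sd(p)


def index_of_validity(r_item_subcriterion, p):
    """Item-subcriterion correlation times item standard deviation."""
    return r_item_subcriterion * item_sd(p)


# Hypothetical item: half the sample answers correctly; it correlates
# .60 with total score and .40 with the subcriterion score.
i_r = index_of_reliability(0.60, 0.50)
i_v = index_of_validity(0.40, 0.50)
print(i_r, i_v, i_r / i_v)
```

Items with a high index of validity and a favorable ratio of the two
indices are the ones the selection technique retains.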

In setting up the two parallel forms of the English structure section, 
items were paired according to both their index of validity and index of 
reliability, so that for every item assigned to Form A there was an equally 
valid and reliable item assigned to Form B. From 256 items representing 
the 86 grammatical categories initially chosen as essential to proficiency 
in English, 124 items representing 29 grammatical categories were finally
selected. 



TABLE 12 

Categorization of Subcriterial Items 



Verb: Auxiliary, replacive, "are too"                    .25
Conjunction: Subordinate, "so"                           .25
Modifier: Comparative, "different from"                  .24
Nominal: Derived noun, "friendship"                      .22
Conjunction: Subordinate, "so"                           .22
Modifier: Comparative, "the more. . .the. . ."           .21
Modifier: Comparative, "the more. . .the. . ."           .19
Verb: Auxiliary, replacive, "do so"                      .17
Modifier: Negative word order, "did not"                 .17
Verb: Separable, "write down"                            .16
Conjunction: Subordinate, "wish that"                    .16
Modifier: Comparative, "as. . .as. . ."                  .14
Nominal: Possessive pronoun, "you. . .your"              .13
Nominal: Relative pronoun, "place. . .where"             .12
Verb: Agreement with adverb, "before. . .ing"            .12
Modifier: Comparative                                    .11

Median φ1 - φ2
Mean φ1 - φ2



The listening comprehension items were validated in two separate 
sections: a picture section and a conversation section. Picture items 
were selected on the basis of item correlation with total score on the 
picture section. Conversation items were selected on the basis of item 
correlation with total score on the conversation section. Of 19 picture
items, 14 were selected; of 25 conversation items, 16 were selected for
inclusion in the final LC section, which contains both picture and
conversation items.
The present scoring system for the Oral Production section makes
adequate description of the target picture a requirement for accepting the
response for scoring. Once the response is judged by the examiner to be an
adequate description of the target picture, it is then evaluated for
grammatical correctness. For each item one or two grammatical aspects have
been selected for evaluation, and correctness is determined for these
aspects only. Making adequate description a preliminary criterion, and
picking the particular grammatical aspects to be scored for each item, were
the final result of several stages of development.

In the initial stages, all responses were tape recorded and analyzed
later. The first scoring system rated plus or minus on four factors:
intelligibility--whether the response could be understood; description--
whether the response adequately described the target picture;
grammaticality--whether there were any grammatical errors in the response;
vocabulary--whether there were any vocabulary errors. Sub-scores on these
factors were included in the correlation study of techniques and criteria.
Results from the study showed that only description and grammaticality were
consistently positively correlated with other measures of English
proficiency. The correlations were strongest with grammaticality. As a
result, in the second stage of the development of a scoring system for the
OP section, an expanded classificatory system was worked out which allowed
errors to be broken down into four categories: grammar, pronunciation,
description and vocabulary. The description and vocabulary categories were
not further broken down; the grammar category was divided into errors
concerning nouns, errors concerning verbs, and errors of complexity. Errors
concerning nouns were further divided as number, gender, absence of noun,
determiner and other. Errors concerning verbs were further divided into
errors of number, absence of verb, tense and prepositions. Errors in
complexity were those where the two required ideas were combined through
simple conjunction (using "and"), were given in two simple sentences, or
were coupled with "when". Pronunciation errors were broken into errors
concerning consonants, errors concerning vowels and errors of fluency.
Fluency was defined as starting the response again, two or more times. The
matrix used is given in Table 13.
TABLE 13

Grammar                               Pronunciation   Description   Vocabulary
  Nouns     Verbs     Complexity

[The rotated sub-column labels of the matrix are not recoverable.]


Five hundred children were tape recorded taking the OP test in Fall 
1970. The tapes were returned and scored using the matrix. The results 
were submitted for standard item analysis. On the basis of the results it 
was decided to make adequate description an absolute criterion for accepting 
a response for further evaluation. Once a response is accepted as being 
adequately descriptive, it is evaluated only as to whether it contains what 
was the most powerfully discriminating error for that set of pictures. The 
last eight pictures are scored for complexity, in addition to the category
selected by the item analysis procedure. After the errors to be scored 
were selected, they were weighted from one to four points per item. The 
weights were assigned on the basis of the biserial correlation coefficient 
of item score with OP test score. The biserial correlation was computed on 
only the top and bottom 27 percent of each group, with the middle 46 percent 
of each group omitted. 

Because of the weighting of the items, the score on the OP section is 
not a straight linear function of the number of items on which no error was 
made. Since standard formulas giving estimates of test reliability require
that score be a linear function of number of items answered correctly, and
since the question of scorer reliability arises with a test of production,
the reliability figure reported for the OP section does not have the same
meaning as the figure for the ES and LC sections. In an attempt to account
for both internal consistency and scorer reliability, the geometric mean of
an internal consistency estimate (KR-20) and a scorer reliability figure
(product-moment) was taken. The KR-20 estimate is .728; the figure from the
rater reliability study is .755. Their geometric mean, .74, was used to
compute the SE for the OP section.
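
The arithmetic can be checked with a short sketch. The geometric mean
reproduces the .74 quoted above. The standard error step uses the usual
formula SE = SD x sqrt(1 - reliability); applying it on the standard
score scale (SD of 100) is an assumption made here for illustration, and
the function names are hypothetical.

```python
import math


def geometric_mean(a, b):
    """Geometric mean of two reliability coefficients."""
    return math.sqrt(a * b)


def standard_error_of_measurement(sd, reliability):
    """Usual SE of measurement: SD times sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


combined = geometric_mean(0.728, 0.755)  # KR-20 and rater reliability
print(round(combined, 2))

# Assumed scale: standard scores with SD 100, reliability rounded to .74.
print(round(standard_error_of_measurement(100, 0.74), 1))
```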
The KR-20 formula was used as the source for the internal consistency
figure used to calculate the geometric mean, even though one of the
assumptions (linearity of score and number of items correct) was violated.
An estimate of the magnitude of error resulting from this violation must be
attempted. Gulliksen (1950, p. 326) gives a formula developed by Wilks which
approximates the mean value of the correlation between two weighted
composites,

$$\bar{R} = 1 - \frac{1}{2\bar{r}K}\left[\left(\frac{S_v}{\bar{V}}\right)^2 + \left(\frac{S_w}{\bar{W}}\right)^2\right],$$

where $\bar{R}$ is the mean value of the correlation between the weighted
composites, $\bar{r}$ is the average intercorrelation between the variables
being combined, $K$ is the number of variables being combined, $\bar{V}$ and
$\bar{W}$ are the means of the two systems of weights, and $S_v$ and $S_w$
are the standard deviations of the two systems of weights. For the
unweighted case $\bar{V} = 1$ and $S_v = 0$. If both are unweighted
$\bar{R}$ becomes 1. The question is how much the set of weights used with
the OP section changes this value. The mean of the OP weights is 2.86 and
the SD is 3.06. So with the OP weights $\bar{R}$ becomes
$1 - \frac{1.145}{2\bar{r}K}$. Since there are 21 items, this becomes
$1 - \frac{.027}{\bar{r}}$. Even if $\bar{r}$ were as low as 0.3, $\bar{R}$
would still be 1 - .09 or .91.
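
The approximation can be evaluated numerically as a check on the figures
above. The sketch assumes the formula has the form 1 minus the sum of the
squared ratios of SD to mean for the two weighting systems, divided by
twice the product of the average intercorrelation and the number of
variables; this form reproduces the .027 and .91 quoted in the text. The
function and parameter names are hypothetical.

```python
def mean_composite_correlation(r_bar, k, mean_v, sd_v, mean_w, sd_w):
    """Approximate mean correlation between two weighted composites."""
    spread = (sd_v / mean_v) ** 2 + (sd_w / mean_w) ** 2
    return 1.0 - spread / (2.0 * r_bar * k)


# Unweighted scoring (mean 1, SD 0) compared with the OP weights
# (mean 2.86, SD 3.06) over 21 items, average intercorrelation 0.3.
r_op = mean_composite_correlation(0.3, 21, 1.0, 0.0, 2.86, 3.06)
print(round(r_op, 2))

# Two unweighted systems correlate perfectly under the approximation.
print(mean_composite_correlation(0.3, 21, 1.0, 0.0, 1.0, 0.0))
```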

The rater reliability figure is based on the correlation of total
[illegible] judges rated each of five subjects twice, and four judges rated
each of ten subjects once, for a total of 200 pairs of ratings. The
resultant figure of .755 thus reflects both inter- and intra-judge
reliability.

Test Inter-Correlation 

Inter-correlations among parts of the test are as given in Table 14. 
TABLE 14

Inter-Correlations of Sections of TOPESL

GROUP               ES/LC     ES/OP     LC/OP

Choctaw              .71       .45       .47
Eskimo               .56       .51       .40
Hopi                 .55       .53       .42
Navajo               .53       .51       .43
Combined groups      .56       .50       .43



Parallel Forms Data 

Five hundred and two students took both Forms A and B of the ES
section. All groups took the second form within five days of taking the
first. Exactly half had Form A first and half had Form B first. The
correlation with A first, r_AB, was .890; the correlation with B first, r_BA,
was .900. The mean raw score on Form A was 35.4; on Form B, 34.8. The
standard deviation of Form A was 13.7; that of Form B, 13.3.



Combined Scores for Forms A and B of the ES Section 

A preliminary sample of 538 persons, of whom 269 took Form A and
269 Form B, was drawn to determine if the population mean for Form A was
different from that for Form B. The means, SD's and critical ratios for the
sample were as follows:

                      M        SD

Form A              34.84    16.28
Form B              33.70    15.80
Difference           1.14      .48
Critical Ratio        .83      .49
Probability           .40      .60



On the basis of this sample it was concluded that the population means and
SD's for Form A and B were not different. This conclusion was borne out by
correlation information on the total norms sample of 6,771. The correlation
of ES score with form was +.003.
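The critical ratios in the comparison above can be approximated with a brief sketch. It assumes the usual large-sample standard errors: sqrt(SD1^2/N1 + SD2^2/N2) for a difference between means, and SD/sqrt(2N) for each standard deviation. The manual does not state which formulas were used, so these are assumptions. The Form A mean is taken as 34.84, the value implied by the reported difference of 1.14. Function names are illustrative.

```python
import math


def critical_ratio_means(m1, sd1, n1, m2, sd2, n2):
    """Difference between two means divided by its standard error."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / se_diff


def critical_ratio_sds(sd1, n1, sd2, n2):
    """Difference between two SDs divided by its standard error.

    Assumes the large-sample approximation SE(SD) = SD / sqrt(2N).
    """
    se_diff = math.sqrt(sd1 ** 2 / (2 * n1) + sd2 ** 2 / (2 * n2))
    return (sd1 - sd2) / se_diff


def two_tailed_probability(z):
    """Two-tailed normal probability of a ratio at least this large."""
    return math.erfc(abs(z) / math.sqrt(2.0))


cr_mean = critical_ratio_means(34.84, 16.28, 269, 33.70, 15.80, 269)
cr_sd = critical_ratio_sds(16.28, 269, 15.80, 269)
p_mean = two_tailed_probability(cr_mean)
p_sd = two_tailed_probability(cr_sd)
print(round(cr_mean, 2), round(p_mean, 2))
print(round(cr_sd, 2), round(p_sd, 2))
```

Under these assumptions the ratios come out near .82 and .49, close to the tabled .83 and .49. The probabilities likewise fall near the tabled values, and neither difference approaches significance.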



Speededness 



Speededness data were collected only for the ES section of TOPESL, as
the format of the LC and OP sections precludes any strong speededness effects.
Percentages of persons reaching item 46 (3/4 finished) and item 62 (end of
test) are indicated in Table 15. Figures are based on the item analysis sample
of 795 persons.



TABLE 15

% Reaching Item Number (expressed as proportions)

                       Form A            Form B
Group                #46    #62        #46    #62

Choctaw              .47    .31        .64    .33
Eskimo               .89    .68        .88    .74
Hopi                 .98    .94        .98    .93
Navajo               .80    .66        .83    .55
Combined groups      .81    .66        .84    .62



Item Statistics 

Item statistics based on the item analysis sample for combined groups
are presented in Tables 16 and 17. Reported are p, p', r, and item
categorization for each item on the written test. Tables 18 and 19 give p
and p' for individual language groups.



Research with TOPESL 

Subsidiary information collected during the norming administration
included data on school size, school accessibility, English contact, sex
differences, and grade and age phenomena. Correlations of English contact
with scores, and of school size with scores, are given in Section II of the
manual, in Tables 3 and 10.

No significant correlations were found for accessibility. Sex data are
available for the Navajos only. These are reported in Table 16.

The relative influence of grade and age on section scores can be seen
in Table 17. Though both age and grade tend to correlate positively with
scores, when grade is partialed out of the age-score correlations, they
become negative, in some groups strongly so, indicating that within the
same grade, older children tend not to do as well as younger ones.
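Partialing grade out of an age-score correlation follows the standard first-order partial correlation formula, sketched below. The age-score (.26) and grade-score (.50) values are the Choctaw ES entries in Table 17. The age-grade correlation is not reported in this manual, so the value of .72 used here is purely hypothetical, chosen only to show how a positive age-score correlation can turn negative once grade is held constant.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator


r_age_score = 0.26    # Table 17, Choctaw, ES
r_grade_score = 0.50  # Table 17, Choctaw, ES
r_age_grade = 0.72    # HYPOTHETICAL: not reported in the manual

age_score_grade_out = partial_correlation(r_age_score, r_age_grade, r_grade_score)
print(round(age_score_grade_out, 2))
```

Because older children are generally in higher grades, much of the raw age-score correlation is carried by grade. Once that shared component is removed, the remaining association of age with score within a grade can be negative.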



TABLE 16

Correlation of Sex and Section Scores for Navajo Students

                  ES                      LC                      OP
Grade      r      SD      N        r      SD      N        r      SD      N

4        .16**   12.2   1,497    .03 ns   5.0   1,483    .13**   13.2    475
5        .18**   13.1   1,339    .06 ns   5.7   1,327    .05 ns  11.4    440
6        .15**   13.1   1,302   -.03 ns   4.9   1,300    .11*    10.7    418



TABLE 17

Correlations of Age, Grade, and Age (Grade Partialed Out)
With TOPESL Section Scores

                        ES      LC      OP       M      SD

Choctaw 4,5,6
  Age                   .26     .12     .32     11.0    1.3
  Grade                 .50     .04     .51      4.9     .9
  Age (Grade Out)      -.17    -.12    -.07

Eskimo 4,5,6
  Age                  -.04     .01    -.23     11.0    1.3
  Grade                 .36     .23     .11      4.9     .8
  Age (Grade Out)      -.41    -.20    -.42

Hopi 4,5,6
  Age                   .10     .21     .08     10.7    1.2
  Grade                 .27     .35     .31      5.0     .9
  Age (Grade Out)      -.18    -.13    -.28

Navajo 4
  Age (Grade Out)      -.04    -.02    -.06     10.4    1.0

Navajo 5
  Age (Grade Out)      -.09    -.07    -.13     11.4     .9

Navajo 6
  Age (Grade Out)      -.17    -.15    -.25     12.4     .9



APPENDIX I



BIA Schools 



Barrow Day School 
Barrow, Alaska 
F 68, Sp 69, F 69 

Choctaw Central School 
Philadelphia, Mississippi 
F 68, Sp 69, F 69 

Hopi Day School 
Oraibi, Arizona 
F 68, Sp 69, Sp 70 

Many Farms Elementary (Navajo) 
Chinle, Arizona 
F 69 

Salt River Day School (Pima) 
Salt River, Arizona 
Su 68 



Chinle Boarding School 
Chinle, Arizona 
F 69, Sp 70 

Chuska Boarding School (Navajo) 
Tohatchi, New Mexico 
Sp 69 

Leupp Boarding School (Navajo) 
Leupp, Arizona 
F 69 

Oglala Community School 
Pine Ridge, South Dakota
F 68, Sp 69 



Los Angeles Schools 



Beethoven Street School 
Los Angeles, California 
Su 69 



Harrison Street School 
Los Angeles, California 
Su 68 



Richland Avenue School 
Los Angeles, California 
Su 68 



San Jose Street School 
Los Angeles, California 
Sp 68 



F = Fall

Sp = Spring

Su = Summer



APPENDIX II



CHOCTAW 

Mississippi 

Conehatta Boarding School 
Conehatta, Mississippi 39057 



Choctaw Central Boarding School 
Philadelphia, Mississippi 39350 



ESKIMO 
Alaska 



Akiachak Day School 
Akiachak, Alaska 99551 

Brevig Mission Day School 
Brevig Mission, Alaska 99785 

Chifornak Day School
Chifornak, Alaska 99561

Gambell Day School
Gambell, Alaska 99742

Hooper Bay Day School 
Hooper Bay, Alaska 99604 

Kasigluk Day School 
Kasigluk, Alaska 99609 

Kotlik Day School 
Kotlik, Alaska 99620 

Mountain Village Day School 
Mountain Village, Alaska 99632 

Nunapitchuk Day School 
Nunapitchuk, Alaska 99641 

Savoonga Day School 
Savoonga, Alaska 99769 

Stebbins Day School
Stebbins, Alaska 99671 

Unalakleet Day School 
Unalakleet, Alaska 99684 



Barrow Day School 
Barrow, Alaska 99723 

Chevak Day School 
Chevak, Alaska 99563 

Elim Day School 
Elim, Alaska 99739 

Golovin Day School 
Golovin, Alaska 99762 

Kalskag Day School 
Kalskag, Alaska 99607 

Kiana Day School 
Kiana, Alaska 99749 

Kotzebue Day School 
Kotzebue, Alaska 99752

Napakiak Day School 
Napakiak, Alaska 99634 

St. Michael Day School 
St. Michael, Alaska 99769

Shaktoolik Day School 
Shaktoolik, Alaska 99771 

Tuntutuliak Day School 
Tuntutuliak, Alaska 99680



HOPI

Arizona 

Hopi Day School 
Oraibi, Arizona 86039 

Second Mesa Day School 
Second Mesa, Arizona 86043



Polacca Day School 
Polacca, Arizona 86042 



NAVAJO 
Arizona 



Chinle Boarding School 
Chinle, Arizona 86505 

Dennehotso Boarding School
Kayenta, Arizona 86033 

Greasewood Boarding School 
Ganado, Arizona 86505 

Kayenta Boarding School 
Kayenta, Arizona 86033 

Lukachukai Boarding School 
Lukachukai, Arizona 86505 

New Cottonwood Boarding School 
Chinle, Arizona 86503 

Shonto Boarding School 
Tonalea, Arizona 86044

Toyei Boarding School 
Ganado, Arizona 86505 



Chrystal Boarding School 
Fort Defiance, Arizona 87504 

Dilcon Boarding School 
Winslow, Arizona 86047 

Kaibeto Boarding School 
Tonalea, Arizona 86044 

Leupp Boarding School 
Leupp, Arizona 86035 

Many Farms Elementary School 
Chinle, Arizona 86505 

Red Lake Boarding School 
Tonalea, Arizona 86044 

Teecnospos Boarding School 
Teecnospos, Arizona 86514 

Tuba City Boarding School 
Tuba City, Arizona 85045 



New Mexico 

Chuska Boarding School
Tohatchi, N. Mex. 87325

Dzilth-Na-O-Dith-hle
Bloomfield, N. Mex. 87415



Crownpoint Boarding School 
Crownpoint, N. Mex. 87513

Nenahnezad Boarding School 
Fruitland, N. Mex. 87301

Sanostee Boarding School 
Little Water, N. Mex. 87420 

Toadlena Boarding School 
Toadlena, N. Mex. 87324 

Wingate Elementary School
Fort Wingate, N. Mex. 87316



Shiprock Boarding School 
Shiprock, N. Mex. 87420 

Tohatchi Boarding School 
Tohatchi, N. Mex. 87325



Utah 



Aneth Boarding School 
Aneth, Utah 84510